{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/embeddings/huggingface.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Local Embeddings with HuggingFace\n",
    "\n",
    "LlamaIndex has support for HuggingFace embedding models, including BGE, Instructor, and more.\n",
    "\n",
    "Furthermore, we provide utilties to create and use ONNX models using the [Optimum library](https://huggingface.co/docs/transformers/serialization#exporting-a-transformers-model-to-onnx-with-optimumonnxruntime) from HuggingFace."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## HuggingFaceEmbedding\n",
    "\n",
    "The base `HuggingFaceEmbedding` class is a generic wrapper around any HuggingFace model for embeddings. You can set either `pooling=\"cls\"` or `pooling=\"mean\"` -- in most cases, you'll want `cls` pooling. But the model card for your particular model may have other recommendations.\n",
    "\n",
    "You can refer to the [embeddings leaderboard](https://huggingface.co/spaces/mteb/leaderboard) for more recommendations on embedding models.\n",
    "\n",
    "This class depends on the transformers package, which you can install with `pip install transformers`.\n",
    "\n",
    "NOTE: if you were previously using a `HuggingFaceEmbeddings` from LangChain, this should give equivilant results."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you're opening this Notebook on colab, you will probably need to install LlamaIndex 🦙."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install llama-index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/loganm/miniconda3/envs/llama-index/lib/python3.11/site-packages/torch/cuda/__init__.py:546: UserWarning: Can't initialize NVML\n",
      "  warnings.warn(\"Can't initialize NVML\")\n"
     ]
    }
   ],
   "source": [
    "from llama_index.embeddings import HuggingFaceEmbedding\n",
    "\n",
    "# loads BAAI/bge-small-en\n",
    "# embed_model = HuggingFaceEmbedding()\n",
    "\n",
    "# loads BAAI/bge-small-en-v1.5\n",
    "embed_model = HuggingFaceEmbedding(model_name=\"BAAI/bge-small-en-v1.5\")"
   ]
  },
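  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If the model card for your model recommends mean pooling instead of CLS pooling, you can pass the strategy explicitly. A minimal sketch (reusing the same BGE model purely for illustration):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# explicitly select the pooling strategy (\"cls\" or \"mean\")\n",
    "mean_pooled_model = HuggingFaceEmbedding(\n",
    "    model_name=\"BAAI/bge-small-en-v1.5\", pooling=\"mean\"\n",
    ")"
   ]
  },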
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Hello World!\n",
      "384\n",
      "[-0.030880315229296684, -0.11021008342504501, 0.3917851448059082, -0.35962796211242676, 0.22797748446464539]\n"
     ]
    }
   ],
   "source": [
    "embeddings = embed_model.get_text_embedding(\"Hello World!\")\n",
    "print(len(embeddings))\n",
    "print(embeddings[:5])"
   ]
  },
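  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Embedding models in LlamaIndex also expose `get_query_embedding` for the retrieval side; for BGE models, a query instruction is typically prepended under the hood. As a quick sanity check (a minimal sketch -- the example sentences are our own), we can compare a query against a relevant and an unrelated passage with cosine similarity:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "\n",
    "def cosine_similarity(a, b):\n",
    "    a, b = np.asarray(a), np.asarray(b)\n",
    "    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))\n",
    "\n",
    "\n",
    "# queries go through get_query_embedding, documents through get_text_embedding\n",
    "query_emb = embed_model.get_query_embedding(\"What is the capital of France?\")\n",
    "relevant = embed_model.get_text_embedding(\"Paris is the capital of France.\")\n",
    "unrelated = embed_model.get_text_embedding(\"Bananas are rich in potassium.\")\n",
    "\n",
    "# the relevant passage should score noticeably higher\n",
    "print(cosine_similarity(query_emb, relevant))\n",
    "print(cosine_similarity(query_emb, unrelated))"
   ]
  },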
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## InstructorEmbedding\n",
    "\n",
    "Instructor Embeddings are a class of embeddings specifically trained to augment their embeddings according to an instruction. By default, queries are given `query_instruction=\"Represent the question for retrieving supporting documents: \"` and text is given `text_instruction=\"Represent the document for retrieval: \"`.\n",
    "\n",
    "They rely on the `Instructor` and `SentenceTransformers` pip package, which you can install with `pip install InstructorEmbedding` and `pip install -U sentence-transformers`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/loganm/miniconda3/envs/llama-index/lib/python3.11/site-packages/InstructorEmbedding/instructor.py:7: TqdmExperimentalWarning: Using `tqdm.autonotebook.tqdm` in notebook mode. Use `tqdm.tqdm` instead to force console mode (e.g. in jupyter console)\n",
      "  from tqdm.autonotebook import trange\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "load INSTRUCTOR_Transformer\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/loganm/miniconda3/envs/llama-index/lib/python3.11/site-packages/torch/cuda/__init__.py:546: UserWarning: Can't initialize NVML\n",
      "  warnings.warn(\"Can't initialize NVML\")\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "max_seq_length  512\n"
     ]
    }
   ],
   "source": [
    "from llama_index.embeddings import InstructorEmbedding\n",
    "\n",
    "embed_model = InstructorEmbedding(model_name=\"hkunlp/instructor-base\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "768\n",
      "[ 0.02155361 -0.06098218  0.01796207  0.05490903  0.01526906]\n"
     ]
    }
   ],
   "source": [
    "embeddings = embed_model.get_text_embedding(\"Hello World!\")\n",
    "print(len(embeddings))\n",
    "print(embeddings[:5])"
   ]
  },
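  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since the default instructions are just strings, you can override them for a specific domain. A minimal sketch (the instruction wording below is our own, purely illustrative):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# override the default query/text instructions for a domain-specific corpus\n",
    "embed_model = InstructorEmbedding(\n",
    "    model_name=\"hkunlp/instructor-base\",\n",
    "    query_instruction=\"Represent the finance question for retrieving supporting documents: \",\n",
    "    text_instruction=\"Represent the finance document for retrieval: \",\n",
    ")\n",
    "\n",
    "embeddings = embed_model.get_text_embedding(\"Markets rallied after the rate cut.\")\n",
    "print(len(embeddings))"
   ]
  },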
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## OptimumEmbedding\n",
    "\n",
    "Optimum in a HuggingFace library for exporting and running HuggingFace models in the ONNX format.\n",
    "\n",
    "You can install the dependencies with `pip install transformers optimum[exporters]`.\n",
    "\n",
    "First, we need to create the ONNX model. ONNX models provide improved inference speeds, and can be used across platforms (i.e. in TransformersJS)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/loganm/miniconda3/envs/llama-index/lib/python3.11/site-packages/torch/cuda/__init__.py:546: UserWarning: Can't initialize NVML\n",
      "  warnings.warn(\"Can't initialize NVML\")\n",
      "Framework not specified. Using pt to export to ONNX.\n",
      "Using the export variant default. Available variants are:\n",
      "\t- default: The default ONNX variant.\n",
      "Using framework PyTorch: 2.0.1+cu117\n",
      "Overriding 1 configuration item(s)\n",
      "\t- use_cache -> False\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "============= Diagnostic Run torch.onnx.export version 2.0.1+cu117 =============\n",
      "verbose: False, log level: Level.ERROR\n",
      "======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================\n",
      "\n",
      "Saved optimum model to ./bge_onnx. Use it with `embed_model = OptimumEmbedding(folder_name='./bge_onnx')`.\n"
     ]
    }
   ],
   "source": [
    "from llama_index.embeddings import OptimumEmbedding\n",
    "\n",
    "OptimumEmbedding.create_and_save_optimum_model(\n",
    "    \"BAAI/bge-small-en-v1.5\", \"./bge_onnx\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "embed_model = OptimumEmbedding(folder_name=\"./bge_onnx\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "384\n",
      "[-0.10364960134029388, -0.20998482406139374, -0.01883639395236969, -0.5241696834564209, 0.0335749015212059]\n"
     ]
    }
   ],
   "source": [
    "embeddings = embed_model.get_text_embedding(\"Hello World!\")\n",
    "print(len(embeddings))\n",
    "print(embeddings[:5])"
   ]
  },
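  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that pooling and normalization defaults can differ between the two wrappers, so the raw values above need not match the earlier `HuggingFaceEmbedding` output exactly. As a hedged sanity check (a minimal sketch), we can measure how closely the ONNX export tracks the original PyTorch model on the same text:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "from llama_index.embeddings import HuggingFaceEmbedding\n",
    "\n",
    "\n",
    "def cosine_similarity(a, b):\n",
    "    a, b = np.asarray(a), np.asarray(b)\n",
    "    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))\n",
    "\n",
    "\n",
    "# embed the same text with the original PyTorch model and the ONNX export\n",
    "pytorch_model = HuggingFaceEmbedding(model_name=\"BAAI/bge-small-en-v1.5\")\n",
    "text = \"Hello World!\"\n",
    "print(\n",
    "    cosine_similarity(\n",
    "        pytorch_model.get_text_embedding(text),\n",
    "        embed_model.get_text_embedding(text),\n",
    "    )\n",
    ")"
   ]
  },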
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Benchmarking\n",
    "\n",
    "Let's try comparing using a classic large document -- the IPCC climate report, chapter 3."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n",
      "To disable this warning, you can either:\n",
      "\t- Avoid using `tokenizers` before the fork if possible\n",
      "\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n",
      "  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n",
      "                                 Dload  Upload   Total   Spent    Left  Speed\n",
      "100 20.7M  100 20.7M    0     0  16.5M      0  0:00:01  0:00:01 --:--:-- 16.5M\n"
     ]
    }
   ],
   "source": [
    "!curl https://www.ipcc.ch/report/ar6/wg2/downloads/report/IPCC_AR6_WGII_Chapter03.pdf --output IPCC_AR6_WGII_Chapter03.pdf"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext\n",
    "\n",
    "documents = SimpleDirectoryReader(\n",
    "    input_files=[\"IPCC_AR6_WGII_Chapter03.pdf\"]\n",
    ").load_data()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Base HuggingFace Embeddings"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import openai\n",
    "\n",
    "# needed to synthesize responses later\n",
    "os.environ[\"OPENAI_API_KEY\"] = \"sk-...\"\n",
    "openai.api_key = os.environ[\"OPENAI_API_KEY\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.embeddings import HuggingFaceEmbedding\n",
    "\n",
    "# loads BAAI/bge-small-en-v1.5\n",
    "embed_model = HuggingFaceEmbedding(model_name=\"BAAI/bge-small-en-v1.5\")\n",
    "test_emeds = embed_model.get_text_embedding(\"Hello World!\")\n",
    "\n",
    "service_context = ServiceContext.from_defaults(embed_model=embed_model)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "3bfb6c2115c447e5af19bb18b5a07ebe",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Parsing documents into nodes:   0%|          | 0/172 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "55c188c3d18049df933cd87bd5a49ed1",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Generating embeddings:   0%|          | 0/428 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1min 27s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)\n"
     ]
    }
   ],
   "source": [
    "%%timeit -r 1 -n 1\n",
    "index = VectorStoreIndex.from_documents(\n",
    "    documents, service_context=service_context, show_progress=True\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Optimum Embeddings\n",
    "\n",
    "We can use the onnx embeddings we created earlier"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.embeddings import OptimumEmbedding\n",
    "\n",
    "embed_model = OptimumEmbedding(folder_name=\"./bge_onnx\")\n",
    "test_emeds = embed_model.get_text_embedding(\"Hello World!\")\n",
    "\n",
    "service_context = ServiceContext.from_defaults(embed_model=embed_model)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "cf49fd773c6f486f9fee323db590bcd5",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Parsing documents into nodes:   0%|          | 0/172 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "7f1f3f8a5d3140bd9982a158a6ef9b9f",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Generating embeddings:   0%|          | 0/428 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1min 9s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)\n"
     ]
    }
   ],
   "source": [
    "%%timeit -r 1 -n 1\n",
    "index = VectorStoreIndex.from_documents(\n",
    "    documents, service_context=service_context, show_progress=True\n",
    ")"
   ]
  }
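  ,
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since `%%timeit` does not keep the variables it creates, we rebuild the index once with the ONNX embeddings to actually query it -- this is where the OpenAI key set earlier is used to synthesize the response. The question below is our own, purely illustrative:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "index = VectorStoreIndex.from_documents(\n",
    "    documents, service_context=service_context\n",
    ")\n",
    "\n",
    "query_engine = index.as_query_engine()\n",
    "response = query_engine.query(\"What are the main risks to ocean ecosystems?\")\n",
    "print(response)"
   ]
  }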
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "llama-index",
   "language": "python",
   "name": "llama-index"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
